{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Evaluation, Cross-Validation, and Model Selection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### By Heiko Strathmann - [heiko.strathmann@gmail.com](mailto:heiko.strathmann@gmail.com) - http://github.com/karlnapf - http://herrstrathmann.de.\n",
    "Based on the model selection framework of his [Google summer of code 2011 project](http://www.google-melange.com/gsoc/project/google/gsoc2011/XXX) |  Saurabh Mahindre - [github.com/Saurabh7](https://github.com/Saurabh7) as a part of [Google Summer of Code 2014 project](http://www.google-melange.com/gsoc/project/details/google/gsoc2014/saurabh7/5750085036015616) mentored by - Heiko Strathmann "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook illustrates the evaluation of prediction algorithms in Shogun using <a href=\"http://en.wikipedia.org/wiki/Cross-validation_(statistics)\">cross-validation</a>, and selecting their parameters using <a href=\"http://en.wikipedia.org/wiki/Hyperparameter_optimization\">grid-search</a>. We demonstrate this for a toy example on <a href=\"http://en.wikipedia.org/wiki/Binary_classification\">Binary Classification</a> using <a href=\"http://en.wikipedia.org/wiki/Support_vector_machine\">Support Vector Machines</a> and also a [regression](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CLOOCrossValidationSplitting.html) problem on a real world dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. [General Idea](#General-Idea)\n",
    "2. [Splitting Strategies](#Types-of-splitting-strategies)\n",
    "  1. [K-fold cross-validation](#K-fold-cross-validation)\n",
    "  2. [Stratified cross-validation](#Stratified-cross-validation)\n",
    "3. [Example: Binary classification](#Toy-example:-Binary-Support-Vector-Classification)\n",
    "4. [Example: Regression](#Regression-problem-and-cross-validation)\n",
    "5. [Model Selection: Grid Search](#Model-selection-using-Grid-Search)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## General Idea"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cross validation aims to estimate an algorithm's performance on unseen data. For example, one might be interested in the average classification accuracy of a Support Vector Machine when being applied to new data, that it was not trained on. This is important in order to compare the performance different algorithms on the same target. Most crucial is the point that the data that was used for running/training the algorithm is not used for testing. Different algorithms here also can mean different parameters of the same algorithm. Thus, cross-validation can be used to tune parameters of learning algorithms, as well as comparing different families of algorithms against each other. Cross-validation estimates are related to the marginal likelihood in Bayesian statistics in the sense that using them for selecting models avoids overfitting."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Evaluating an algorithm's performance on training data should be avoided since the learner may adjust to very specific random features of the training data which are not very important to the general relation. This is called [overfitting](http://en.wikipedia.org/wiki/Overfitting). Maximising performance on the training examples usually results in algorithms explaining the noise in data (rather than actual patterns), which leads to bad performance on unseen data. This is one of the reasons behind splitting the data and using different splits for training and testing, which can be done using cross-validation.\n",
    "Let us generate some toy data for binary classification to try cross validation on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%pylab inline\n",
    "%matplotlib inline\n",
    "# include all Shogun classes\n",
    "import os\nSHOGUN_DATA_DIR=os.getenv('SHOGUN_DATA_DIR', '../../../data')\n",
    "from shogun import *\n",
    "# generate some ultra easy training data\n",
    "gray()\n",
    "n=20\n",
    "title('Toy data for binary classification')\n",
    "X=hstack((randn(2,n), randn(2,n)+1))\n",
    "Y=hstack((-ones(n), ones(n)))\n",
    "_=scatter(X[0], X[1], c=Y , s=100)\n",
    "p1 = Rectangle((0, 0), 1, 1, fc=\"w\")\n",
    "p2 = Rectangle((0, 0), 1, 1, fc=\"k\")\n",
    "legend((p1, p2), [\"Class 1\", \"Class 2\"], loc=2)\n",
    "\n",
    "# training data in Shogun representation\n",
    "features=RealFeatures(X)\n",
    "labels=BinaryLabels(Y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Types of splitting strategies"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As said earlier Cross-validation is based upon splitting the data into multiple partitions. Shogun has various strategies for this. The base class for them is [CSplittingStrategy](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CSplittingStrategy.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### K-fold cross-validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Formally, this is achieved via partitioning a dataset $X$ of size $|X|=n$ into $k \\leq n$ disjoint partitions $X_i\\subseteq X$ such that $X_1 \\cup X_2 \\cup \\dots \\cup X_n = X$ and $X_i\\cap X_j=\\emptyset$ for all $i\\neq j$. Then, the algorithm is executed on all $k$ possibilities of merging $k-1$ partitions and subsequently tested on the remaining partition. This results in $k$ performances which are evaluated in some metric of choice (Shogun support multiple ones). The procedure can be repeated (on different splits) in order to obtain less variance in the estimate. See  [1] for a nice review on cross-validation using different performance measures."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "k=5\n",
    "normal_split=CrossValidationSplitting(labels, k)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Stratified cross-validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On classificaiton data, the best choice is [stratified cross-validation](http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29#Common_types_of_cross-validation). This divides the data in such way that the fraction of labels in each partition is roughly the same, which reduces the variance of the performance estimate quite a bit, in particular for data with more than two classes. In Shogun this is implemented by [CStratifiedCrossValidationSplitting](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CStratifiedCrossValidationSplitting.html) class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "stratified_split=StratifiedCrossValidationSplitting(labels, k)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Leave One Out cross-validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Leave One Out Cross-validation](http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29#Leave-one-out_cross-validation) holds out one sample as the validation set. It is thus a special case of K-fold cross-validation with $k=n$ where $n$ is number of samples. It is implemented in [LOOCrossValidationSplitting](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CLOOCrossValidationSplitting.html) class."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us visualize the generated folds on the toy data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "split_strategies=[stratified_split, normal_split]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#code to visualize splitting\n",
    "def get_folds(split, num):\n",
    "    split.build_subsets()\n",
    "    x=[]\n",
    "    y=[]\n",
    "    lab=[]\n",
    "    for j in range(num):\n",
    "        indices=split.generate_subset_indices(j)\n",
    "        x_=[]\n",
    "        y_=[]\n",
    "        lab_=[]\n",
    "        for i in range(len(indices)):\n",
    "            x_.append(X[0][indices[i]])\n",
    "            y_.append(X[1][indices[i]])\n",
    "            lab_.append(Y[indices[i]])\n",
    "        x.append(x_)\n",
    "        y.append(y_)\n",
    "        lab.append(lab_)\n",
    "    return x, y, lab\n",
    "    \n",
    "def plot_folds(split_strategies, num):\n",
    "    for i in range(len(split_strategies)):\n",
    "        x, y, lab=get_folds(split_strategies[i], num)\n",
    "        figure(figsize=(18,4))\n",
    "        gray()\n",
    "        suptitle(split_strategies[i].get_name(), fontsize=12)\n",
    "        for j in range(0, num):\n",
    "            subplot(1, num, (j+1), title='Fold %s' %(j+1))\n",
    "            scatter(x[j], y[j], c=lab[j], s=100)\n",
    "        \n",
    "_=plot_folds(split_strategies, 4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Stratified splitting takes care that each fold has almost the same number of samples from each class. This is not the case with normal splitting which usually leads to imbalanced folds."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Toy example: Binary Support Vector Classification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Following the example from above, we will tune the performance of a SVM on the binary classification problem. We will\n",
    "\n",
    "* demonstrate how to evaluate a loss function or metric on a given algorithm\n",
    "* then learn how to estimate this metric for the algorithm performing on unseen data\n",
    "* and finally use those techniques to tune the parameters to obtain the best possible results.\n",
    "\n",
    "The involved methods are\n",
    "\n",
    " * [LibSVM](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CLibSVM.html) as the binary classification algorithms\n",
    " * the area under the ROC curve (AUC) as performance metric\n",
    " * three different kernels to compare"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# define SVM with a small rbf kernel (always normalise the kernel!)\n",
    "C=1\n",
    "kernel=GaussianKernel(2, 0.001)\n",
    "kernel.init(features, features)\n",
    "kernel.set_normalizer(SqrtDiagKernelNormalizer())\n",
    "classifier=LibSVM(C, kernel, labels)\n",
    "\n",
    "# train\n",
    "_=classifier.train()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, we now have performed classification on the training data. How good did this work? We can easily do this for many different performance measures."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# instanciate a number of Shogun performance measures\n",
    "metrics=[ROCEvaluation(), AccuracyMeasure(), ErrorRateMeasure(), F1Measure(), PrecisionMeasure(), RecallMeasure(), SpecificityMeasure()]\n",
    "\n",
    "for metric in metrics:\n",
    "    print(metric.get_name(), metric.evaluate(classifier.apply(features), labels))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note how for example error rate is 1-accuracy. All of those numbers represent the training error, i.e. the ability of the classifier to explain the given data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, the training error is zero. This seems good at first. But is this setting of the parameters a good idea? No! A good performance on the training data alone does not mean anything. A simple look up table is able to produce zero error on training data. What we want is that our methods generalises the input data somehow to perform well on unseen data. We will now use cross-validation to estimate the performance on such.\n",
    "\n",
    " We will use [CStratifiedCrossValidationSplitting](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CStratifiedCrossValidationSplitting.html), which accepts a reference to the labels and the number of partitions as parameters. This instance is then passed to the class [CCrossValidation](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CCrossValidation.html), which does the estimation using the desired splitting strategy. The latter class can take all algorithms that are implemented against the [CMachine](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CMachine.html) interface."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "metric=AccuracyMeasure()\n",
    "cross=CrossValidation(classifier, features, labels, stratified_split, metric)\n",
    "\n",
    "# perform the cross-validation, note that this call involved a lot of computation\n",
    "result=cross.evaluate()\n",
    "\n",
    "# the result needs to be casted to CrossValidationResult\n",
    "result=CrossValidationResult.obtain_from_generic(result)\n",
    "\n",
    "# this class contains a field \"mean\" which contain the mean performance metric\n",
    "print(\"Testing\", metric.get_name(), result.get_mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now this is incredibly bad compared to the training error. In fact, it is very close to random performance (0.5). The lesson: Never judge your algorithms based on the performance on training data!\n",
    "\n",
    "Note that for small data sizes, the cross-validation estimates are quite noisy. If we run it multiple times, we get different results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print(\"Testing\", metric.get_name(), [CrossValidationResult.obtain_from_generic(cross.evaluate()).get_mean() for _ in range(10)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is better to average a number of different runs of cross-validation in this case. A nice side effect of this is that the results can be used to estimate error intervals for a given confidence rate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 25 runs and 95% confidence intervals\n",
    "cross.set_num_runs(25)\n",
    "\n",
    "# perform x-validation (now even more expensive)\n",
    "cross.evaluate()\n",
    "result=cross.evaluate()\n",
    "result=CrossValidationResult.obtain_from_generic(result)\n",
    "\n",
    "print(\"Testing cross-validation mean %.2f \" \\\n",
    "% (result.get_mean()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using this machinery, it is very easy to compare multiple kernel parameters against each other to find the best one. It is even possible to compare a different kernel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "widths=2**linspace(-5,25,10)\n",
    "results=zeros(len(widths))\n",
    "\n",
    "for i in range(len(results)):\n",
    "    kernel.set_width(widths[i])\n",
    "    result=CrossValidationResult.obtain_from_generic(cross.evaluate())\n",
    "    results[i]=result.get_mean()\n",
    "    \n",
    "plot(log2(widths), results, 'blue')\n",
    "xlabel(\"log2 Kernel width\")\n",
    "ylabel(metric.get_name())\n",
    "_=title(\"Accuracy for different kernel widths\")\n",
    "\n",
    "print(\"Best Gaussian kernel width %.2f\" % widths[results.argmax()], \"gives\", results.max())\n",
    "\n",
    "# compare this with a linear kernel\n",
    "classifier.set_kernel(LinearKernel())\n",
    "lin_k=CrossValidationResult.obtain_from_generic(cross.evaluate())\n",
    "plot([log2(widths[0]), log2(widths[len(widths)-1])], [lin_k.get_mean(),lin_k.get_mean()], 'r')\n",
    "\n",
    "# please excuse this horrible code :)\n",
    "print(\"Linear kernel gives\", lin_k.get_mean())\n",
    "\n",
    "_=legend([\"Gaussian\", \"Linear\"], loc=\"lower center\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This gives a brute-force way to select paramters of any algorithm implemented under the [CMachine](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CMachine.html) interface. The cool thing about this is, that it is also possible to compare different model families against each other. Below, we compare a a number of regression models in Shogun on the Boston Housing dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Regression problem and cross-validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Various regression models in Shogun are now used to predict house prices using the [boston housing dataset](https://archive.ics.uci.edu/ml/datasets/Housing). Cross-validation is used to find best parameters and also test the performance of the models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "feats=RealFeatures(CSVFile(os.path.join(SHOGUN_DATA_DIR, 'uci/housing/fm_housing.dat')))\n",
    "labels=RegressionLabels(CSVFile(os.path.join(SHOGUN_DATA_DIR, 'uci/housing/housing_label.dat')))\n",
    "preproc=RescaleFeatures()\n",
    "preproc.init(feats)\n",
    "feats.add_preprocessor(preproc)\n",
    "feats.apply_preprocessor(True)\n",
    "\n",
    "#Regression models\n",
    "ls=LeastSquaresRegression(feats, labels)\n",
    "\n",
    "tau=1\n",
    "rr=LinearRidgeRegression(tau, feats, labels)\n",
    "\n",
    "width=1\n",
    "tau=1\n",
    "kernel=GaussianKernel(feats, feats, width)\n",
    "kernel.set_normalizer(SqrtDiagKernelNormalizer())\n",
    "krr=KernelRidgeRegression(tau, kernel, labels)\n",
    "\n",
    "\n",
    "regression_models=[ls, rr, krr]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us use cross-validation to compare various values of tau paramter for ridge regression ([Regression notebook](http://www.shogun-toolbox.org/static/notebook/current/Regression.html)). We will use [MeanSquaredError](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CMeanSquaredError.html) as the performance metric. Note that normal splitting is used since it might be impossible to generate \"good\" splits using Stratified splitting in case of regression since we have continous values for labels. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "n=30\n",
    "taus = logspace(-4, 1, n)\n",
    "\n",
    "#5-fold cross-validation\n",
    "k=5\n",
    "split=CrossValidationSplitting(labels, k)\n",
    "metric=MeanSquaredError()\n",
    "cross=CrossValidation(rr, feats, labels, split, metric)\n",
    "cross.set_num_runs(50)\n",
    "\n",
    "errors=[]\n",
    "for tau in taus:\n",
    "    #set necessary parameter\n",
    "    rr.set_tau(tau)\n",
    "    result=cross.evaluate()\n",
    "    result=CrossValidationResult.obtain_from_generic(result)\n",
    "    #Enlist mean error for all runs\n",
    "    errors.append(result.get_mean())\n",
    "\n",
    "figure(figsize=(20,6))\n",
    "suptitle(\"Finding best (tau) parameter using cross-validation\", fontsize=12)\n",
    "p=subplot(121)\n",
    "title(\"Ridge Regression\")\n",
    "plot(taus, errors, linewidth=3)\n",
    "p.set_xscale('log')\n",
    "p.set_ylim([0, 80])\n",
    "xlabel(\"Taus\")\n",
    "ylabel(\"Mean Squared Error\")\n",
    "\n",
    "cross=CrossValidation(krr, feats, labels, split, metric)\n",
    "cross.set_num_runs(50)\n",
    "\n",
    "errors=[]\n",
    "for tau in taus:\n",
    "    krr.set_tau(tau)\n",
    "    result=cross.evaluate()\n",
    "    result=CrossValidationResult.obtain_from_generic(result)\n",
    "    #print tau, \"error\", result.get_mean()\n",
    "    errors.append(result.get_mean())\n",
    "\n",
    "p2=subplot(122)\n",
    "title(\"Kernel Ridge regression\")\n",
    "plot(taus, errors, linewidth=3)\n",
    "p2.set_xscale('log')\n",
    "xlabel(\"Taus\")\n",
    "_=ylabel(\"Mean Squared Error\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A low value of error certifies a good pick for the tau paramter which should be easy to conclude from the plots. In case of Ridge Regression the value of tau i.e. the amount of regularization doesn't seem to matter but does seem to in case of Kernel Ridge Regression. One interpretation of this could be the lack of over fitting in the feature space for ridge regression and the occurence of over fitting in the new kernel space in which Kernel Ridge Regression operates. </br>   Next we will compare a range of values for the width of [Gaussian Kernel](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CGaussianKernel.html) used in [Kernel Ridge Regression](http://shogun-toolbox.org/doc/en/latest/classshogun_1_1CKernelRidgeRegression.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "n=50\n",
    "widths=logspace(-2, 3, n)\n",
    "krr.set_tau(0.1)\n",
    "metric=MeanSquaredError()\n",
    "k=5\n",
    "split=CrossValidationSplitting(labels, k)\n",
    "cross=CrossValidation(krr, feats, labels, split, metric)\n",
    "cross.set_num_runs(10)\n",
    "errors=[]\n",
    "for width in widths:\n",
    "    kernel.set_width(width)\n",
    "    result=cross.evaluate()\n",
    "    result=CrossValidationResult.obtain_from_generic(result)\n",
    "    #print width, \"error\", result.get_mean()\n",
    "    errors.append(result.get_mean())\n",
    "    \n",
    "figure(figsize=(15,5))\n",
    "p=subplot(121)\n",
    "title(\"Finding best width using cross-validation\")\n",
    "plot(widths, errors, linewidth=3)\n",
    "p.set_xscale('log')\n",
    "xlabel(\"Widths\")\n",
    "_=ylabel(\"Mean Squared Error\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The values for the kernel parameter and tau may not be independent of each other, so the values we have may not be optimal. A brute force way to do this would be to try all the pairs of these values but it is only feasible for a low number of parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "n=40\n",
    "taus = logspace(-3, 0, n)\n",
    "widths=logspace(-1, 4, n)\n",
    "\n",
    "cross=CrossValidation(krr, feats, labels, split, metric)\n",
    "cross.set_num_runs(1)\n",
    "\n",
    "x, y=meshgrid(taus, widths)\n",
    "grid=array((ravel(x), ravel(y)))\n",
    "print(grid.shape)\n",
    "\n",
    "errors=[]\n",
    "for i in range(0, n*n):\n",
    "    krr.set_tau(grid[:,i][0])\n",
    "    kernel.set_width(grid[:,i][1])\n",
    "    result=cross.evaluate()\n",
    "    result=CrossValidationResult.obtain_from_generic(result)\n",
    "    errors.append(result.get_mean())\n",
    "errors=array(errors).reshape((n, n))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from mpl_toolkits.mplot3d import Axes3D\n",
    "#taus = logspace(0.5, 1, n)\n",
    "jet()\n",
    "fig=figure(figsize(15,7))\n",
    "ax=subplot(121)\n",
    "c=pcolor(x, y, errors)\n",
    "_=contour(x, y, errors, linewidths=1, colors='black')\n",
    "_=colorbar(c)\n",
    "xlabel('Taus')\n",
    "ylabel('Widths')\n",
    "ax.set_xscale('log')\n",
    "ax.set_yscale('log')\n",
    "\n",
    "ax1=fig.add_subplot(122, projection='3d')\n",
    "ax1.plot_wireframe(log10(y),log10(x), errors, linewidths=2, alpha=0.6)\n",
    "ax1.view_init(30,-40)\n",
    "xlabel('Taus')\n",
    "ylabel('Widths')\n",
    "_=ax1.set_zlabel('Error')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us approximately pick the good parameters using the plots. Now that we have the best parameters, let us compare the various regression models on the data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#use the best parameters\n",
    "rr.set_tau(1)\n",
    "krr.set_tau(0.05)\n",
    "kernel.set_width(2)\n",
    "\n",
    "title_='Performance on Boston Housing dataset'\n",
    "print(\"%50s\" %title_)\n",
    "for machine in regression_models:\n",
    "    metric=MeanSquaredError()\n",
    "    cross=CrossValidation(machine, feats, labels, split, metric)\n",
    "    cross.set_num_runs(25)\n",
    "    result=cross.evaluate()\n",
    "    result=CrossValidationResult.obtain_from_generic(result)\n",
    "    print(\"-\"*80)\n",
    "    print(\"|\", \"%30s\" % machine.get_name(),\"|\", \"%20s\" %metric.get_name(),\"|\",\"%20s\" %result.get_mean() ,\"|\"  )\n",
    "print(\"-\"*80)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Model selection using Grid Search"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A standard way of selecting the best parameters of a learning algorithm is by Grid Search. This is done by an exhaustive search of a specified parameter space. [CModelSelectionParameters](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CModelSelectionParameters.html) is used to select various parameters and their ranges to be used for model selection. A tree like structure is used where the nodes can be [CSGObject](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CSGObject.html) or the parameters to the object. The range of values to be searched for the parameters is set using `build_values()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#Root\n",
    "param_tree_root=ModelSelectionParameters()\n",
    "\n",
    "#Parameter tau\n",
    "tau=ModelSelectionParameters(\"tau\")\n",
    "param_tree_root.append_child(tau)\n",
    "\n",
    "# also R_LINEAR/R_LOG is available as type\n",
    "min_value=0.01\n",
    "max_value=1\n",
    "type_=R_LINEAR\n",
    "step=0.05\n",
    "base=2\n",
    "tau.build_values(min_value, max_value, type_, step, base)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we will create [CModelSelectionParameters](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CModelSelectionParameters.html) instance with a kernel object which has to be appended the root node. The kernel object itself will be append with a kernel width parameter which is the parameter we wish to search."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#kernel object\n",
    "param_gaussian_kernel=ModelSelectionParameters(\"kernel\", kernel)\n",
    "gaussian_kernel_width=ModelSelectionParameters(\"log_width\")\n",
    "gaussian_kernel_width.build_values(0.1, 6.0, R_LINEAR, 0.5, 2.0)\n",
    "\n",
    "#kernel parameter \n",
    "param_gaussian_kernel.append_child(gaussian_kernel_width)\n",
    "param_tree_root.append_child(param_gaussian_kernel)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# cross validation instance used\n",
    "cross_validation=CrossValidation(krr, feats, labels, split, metric)\n",
    "cross_validation.set_num_runs(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# model selection instance\n",
    "model_selection=GridSearchModelSelection(cross_validation, param_tree_root)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "print_state=False\n",
    "# TODO: enable it once crossval has been fixed\n",
    "#best_parameters=model_selection.select_model(print_state)\n",
    "\n",
    "#best_parameters.apply_to_machine(krr)\n",
    "#best_parameters.print_tree()\n",
    "result=cross_validation.evaluate()\n",
    "result=CrossValidationResult.obtain_from_generic(result)\n",
    "\n",
    "print('Error with Best parameters:', result.get_mean())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The error value using the parameters obtained from Grid Search is pretty close (and should be better) to the one we had seen in the last section. Grid search suffers from the [curse of dimensionality](http://en.wikipedia.org/wiki/Curse_of_dimensionality) though, which can lead to huge computation costs in higher dimensions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[1] Forman, G. and Scholz, M. (2009). Apples-to-apples in cross-validation studies: Pitfalls in classifier performance measurement. Technical report, HP Laboratories."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
